75 research outputs found

    A Memory Encryption Engine Suitable for General Purpose Processors

    Get PDF
    Cryptographic protection of memory is an essential ingredient for any technology that allows a closed computing system to run software in a trustworthy manner and handle secrets, while its external memory is susceptible to eavesdropping and tampering. An example for such a technology is Intel\u27s emerging Software Guard Extensions technology (Intel SGX) that appears in the latest processor generation, Architecture Codename Skylake. This technology operates under the assumption that the security perimeter includes only the internals of the CPU package, and in particular, leaves the DRAM untrusted. It is supported by an autonomous hardware unit called the Memory Encryption Engine (MEE), whose role is to protect the confidentiality, integrity, and freshness of the CPU-DRAM traffic over some memory range. To succeed in adding this unit to the micro architecture of a general purpose processor product, it must be designed under very strict engineering constraints. This requires a careful combination of cryptographic primitives operating over a customized integrity tree that mostly resides on the DRAM while relying only on a small internally stored root. The purpose of this paper is to explain how this hardware component of SGX works, and the rationale behind some of its design choices. To this end, we formalize the MEE threat model and security objectives, describe the MEE design, cryptographic properties, security margins, and report some concrete performance results

    A j-lanes tree hashing mode and j-lanes SHA-256

    Get PDF
    j-lanes hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-lanes hashing on SIMD architectures, by coding a 4-lanes-SHA-256 implementation and measuring its performance on the latest 3rd Generation Intel® Core™. For message ranging 2KB to 132KB in length, the 4-lanes SHA-256 is between 1.5 to 1.97 times faster than the fastest publicly available implementation (that we are aware of), and between 1.9 to 2.5 times faster than OpenSSL 1.0.1c. For long messages, there is no significant performance difference between different choices of j. We show that the 4-lanes SHA-256 is faster than the two SHA3 finalists (BLAKE and Keccak) that have a published tree mode implementation. We explain why j-lanes hashing will be even faster on the future AVX2 architecture with 256 bits registers. This suggests that standardizing a tree mode for hash functions (SHA-256 in particular) would deliver significant performance benefits for a multitude of algorithms and usages

    Efficient Software Implementations of Modular Exponentiation

    Get PDF
    RSA computations have a significant effect on the workloads of SSL/TLS servers, and therefore their software implementations on general purpose processors are an important target for optimization. We concentrate here on 512-bit modular exponentiation, used for 1024-bit RSA. We propose optimizations in two directions. At the primitives’ level, we study and improve the performance of an “Almost” Montgomery Multiplication. At the exponentiation level, we propose a method to reduce the cost of protecting the w-ary exponentiation algorithm against cache/timing side channel attacks. Together, these lead to an efficient software implementation of 512-bit modular exponentiation, which outperforms the currently fastest publicly available alternative. When measured on the latest x86-64 architecture, the 2nd Generation Intel® Core™ processor, our implementation is 43% faster than that of the current version of OpenSSL (1.0.0d)

    Parallelized hashing via j-lanes and j-pointers tree modes, with applications to SHA-256

    Get PDF
    The j-lanes tree hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. The j-pointers tree hashing is a similar tree mode that receives, as input, j pointers to j messages (or slices of a single message), computes their digests and outputs the hash value of their concatenation. Such modes have parallelization capabilities on a hashing process that is serial by nature. As a result, they have performance advantage on modern processor architectures. This paper provides precise specifications for these hashing modes, proposes a setup for appropriate IV’s definition, and demonstrates their performance on the latest processors. Our hope is that it would be useful for standardization of these modes

    Vortex: A New Family of One Way Hash Functions based on Rijndael Rounds and Carry-less Multiplication

    Get PDF
    We present Vortex a new family of one way hash functions that can produce message digests of 224, 256, 384 and 512 bits. The main idea behind the design of these hash functions is that we use well known algorithms that can support very fast diffusion in a small number of steps. We also balance the cryptographic strength that comes from iterating block cipher rounds with SBox substitution and diffusion (like Whirlpool) against the need to have a lightweight implementation with as small number of rounds as possible. We use a variable number of Rijndael rounds with a stronger key schedule. Our goal is not to protect a secret symmetric key but to support perfect mixing of the bits of the input into the hash value. Rijndael rounds are followed by our variant of Galois Field multiplication. This achieves cross-mixing between 128-bit or 256-bit sets. Our hash function uses the Enveloped Merkle-Damgard construction to support properties such as collision resistance, first and second pre-image resistance, pseudorandom oracle preservation and pseudorandom function preservation. We provide analytical results that demonstrate that the number of queries required for finding a collision with probability greater or equal to 0.5 in an ideal block cipher approximation of Vortex 256 is at least 1.18x2^122.55 if the attacker uses randomly selected message words. We also provide experimental results that indicate that the compression function of Vortex is not inferior to that of the SHA family regarding its capability to preserve the pseudorandom oracle property. We list a number of well known attacks and discuss how the Vortex design addresses them. The main strength of the Vortex design is that this hash function can demonstrate an expected performance of 2.2-2.5 cycles per byte in future processors with instruction set support for Rijndael rounds and carry-less multiplication. We provide arguments why we believe this is a trend in the industry. We also discuss how optimized assembly code can be written that demonstrates such performance

    Distinguishing a truncated random permutation from a random function

    Get PDF
    An oracle chooses a function f from the set of n bits strings to itself, which is either a randomly chosen permutation or a randomly chosen function. When queried by an n-bit string w, the oracle computes f(w), truncates the m last bits, and returns only the first n-m bits of f(w). How many queries does a querying adversary need to submit in order to distinguish the truncated permutation from a random function? In 1998, Hall et al. showed an algorithm for determining (with high probability) whether or not f is a permutation, using O ( 2^((m+n)/2) ) queries. They also showed that if m n/7, their method gives a weaker bound. In this manuscript, we show how a modification of the method used by Hall et al. can solve the porblem completely. It extends the result to essentially every m, showing that Omega ( 2^((m+n)/2) ) queries are needed to get a non-negligible distinguishing advantage. We recently became aware that a better bound for the distinguishing advantage, for every m<n, follows from a result of Stam published, in a different context, already in 1978

    SPHINCS-Simpira: Fast Stateless Hash-based Signatures with Post-quantum Security

    Get PDF
    We introduce SPHINCS-Simpira, which is a variant of the SPHINCS signature scheme with Simpira as a building block. SPHINCS was proposed by Bernstein et al. at EUROCRYPT 2015 as a hash-based signature scheme with post-quantum security. At ASIACRYPT 2016, Gueron and Mouha introduced the Simpira family of cryptographic permutations, which delivers high throughput on modern 64-bit processors by using only one building block: the AES round function. The Simpira family claims security against structural distinguishers with a complexity up to 2^128 using classical computers. In this document, we explain why the same claim can be made against quantum computers as well. Although Simpira follows a very conservative design strategy, our benchmarks show that SPHINCS-Simpira provides a 1.5x speed-up for key generation, a 1.4x speed-up for signing 59-byte messages, and a 2.0x speed-up for verifying 59-byte messages compared to the originally proposed SPHINCS-256

    Security Enhancement of the Vortex Family of Hash Functions

    Get PDF
    Vortex is a new family of one-way hash functions which has been submitted to the NIST SHA-3 competition. Its design is based on using the Rijndael block cipher round as a building block, and using a multiplication-based merging function to support fast mixing in a small number of steps. Vortex is designed to be a fast hash function, when running on a processor that has AES acceleration and has a proven collision resistance [2]. Several attacks on Vortex have been recently published [3, 4, 5, 6] exploiting some structural properties of its design, as presented in the version submitted to the SHA-3 competition. These are mainly ¯rst and second preimage attacks with time complexity below the ideal, as well as attempts to distinguish the Vortex output from random. In this paper we study the root-cause of the attacks and propose few amendments to the Vortex structure, which eliminate the attacks without a®ecting its collision resistance and performance
    • …
    corecore